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Abstract 


In  this  paper,  we  suggest  a  generalized  Gauss-Seidel  approach  to  sparse  linear  and 
nonlinear  least-squares  problems.  The  algorithm,  closely  related  to  one  given  by  Elfving 
(1980),  uses  the  work  of  Curtis,  Powell,  and  Reid  (1974)  as  extended  by  Coleman  and 
Mor6  (1983)  to  divide  the  variables  into  nondisjoint  groups  of  structurally  orthogonal 
columns  and  then  projects  the  updated  residual  into  each  column  subspace  of  the  Jacobian 
in  turn.  In  the  linear  case,  this -procedure  can  be  viewed  as  an  alternate  ordering  of  the 
variables  in  the  Gauss-Seidel  method.  Preliminary  tests  indicate  that  this  leads  quickly  to 
cheap  solutions  of  limited  accuracy  for  linear  problems,  and  that  this  approach  is  promis¬ 
ing  for  an  inexact  Gauss-Newton  analog  of  the  inexact  Newton  approach  of  Dembo,  Eisen- 
stat,  and  Steihaug  (1981). 


Key  words 


Sparse  nonlinear  least  squares,  inexact  Gauss-Newton,  finite-difference  Jacobians,  SOR 
iteration. 
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1.  Introduction 

The  purpose  of  this  paper  is  to  suggest  a  generalized  group  Gauss-Seidel  approach  to 
sparse  least-squares  problems  that  appears  in  preliminary  tests  to  be  useful  in  obtaining 
cheap  solutions  of  limited  accuracy.  For  nonlinear  problems,  the  approach  combines  the 
work  on  linearization  of  Coleman  and  Mor6  (1982),  (1983)  developing  ideas  of  Curtis, 

Powell,  and  Reid  (1974)  with  the  work  on  inexact  Newton  methods  of  Dembo,  Eisenstat, 
and  Steihaug  (1982)  and  Steihaug  (1980).  This  amalgam  leads  here  to  an  especially  con¬ 
venient  implementation  of  the  inexact  Gauss-Newton  method  suggested  in  Section  3. 

The  basic  idea  is  simple.  If  we  apply  the  methods  of  estimating  sparse  Jacobian 
matrices  to  group  the  columns  of  a  coefficient  matrix,  then  each  group  of  columns  is  mutu¬ 
ally  orthogonal  in  their  zero/nonzero  structure  regardless  of  the  particular  values  of  the 
nonzero  entries.  This  has  the  advantage  of  leading  for  each  group  of  columns  to  smaller 
cheaper  least-squares  problems  unaffected  by  conditioning.  The  idea  is  to  cycle  through 
these  groups  solving  each  time  with  the  right-hand  side  updated  as  in  iterative  refinement. 

These  graph-theoretic  algorithms  actually  partition  the  columns  of  the  coefficient 
matrix  A  into  disjoint  groups.  Notice  that  there  is  no  reason  to  want  nonintersecting 
groups  of  columns,  and  it  will  be  interesting  in  our  context  to  consider  overlap  between 
groups.  We  report  some  experiments  in  Section  4  with  a  simple  heuristic  scheme  for  over¬ 
lapping  orthogonal  sets  of  columns  and  for  groups  whose  normal  matrices  are  tridiagonal. 

In  the  nonlinear  case,  there  seem  to  be  diagnostic  advantages  to  maximal  overlap  in 
the  computation  of  finite-difference  Jacobians  for  inaccurate  residuals.  This  is  a  poten¬ 
tially  useful  point  that  has  not  to  our  knowledge  been  pointed  out  and  applies  equally  in 
the  solution  of  square  systems  of  nonlinear  equations.  It  would  require  no  additional  vector 
function  evaluations  to  use  a  different  perturbation  for  a  given  variable  or  column  for  each 
group  to  which  it  belongs.  These  values  could  then  be  used  heuristically  to  refine  the  par¬ 
tial  derivative  estimate  and  estimate  its  accuracy.  In  Section  4,  we  give  some  evidence  to 
support  the  obvious  conjecture  that  overlapping  orthogonal  sets  of  columns  is  helpful  in 
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the  nonlinear  case;  although  it  does  not  seem  worth  the  extra  arithmetic  in  the  linear  case. 

It  is  in  order  to  disclaim  that  we  bypass  the  effects  of  ill-conditioning  by  using  any  of 
the  grouping  strategies  suggested  here;  although,  we  may  make  less  use  than  factorization 
techniques  of  extended  precision  in  obtaining  a  given  accuracy.  It  is  true  that  some  of  the 
subproblems  we  solve  are  unaffected  by  conditioning  because  they  are  diagonal,  but  a  prob¬ 
lem  with  two  nearly-dependent  columns  shows  what  happens  in  the  ill-conditioned  case  and 
illustrates  the  algorithm  in  residual  space.  We  first  project  the  right-hand  side,  say  r°,  onto 
the  subspace  spanned  by  the  first  column  dj.  Now  we  subtract  from  r°  the  part  of  the  resi¬ 
dual  accounted  for  by  this  projection,  and  then  project  the  new  residual  r1  onto  a2.  The 
algorithm  repeats  the  process  and  the  reader  will  see  that  in  order  to  reduce  to  an  accept¬ 
able  level  the  part  of  the  residual  that  can  be  accounted  for  by  a  linear  combination  of  aj 
and  a2,  we  will  need  more  iterations  for  a  small  angle  between  a}  and  a?.  If  were  orthog¬ 
onal  to  a2,  then  one  projection  onto  each  would  suffice  to  reduce  the  residual  as  much  as 
possible.  This  is  exactly  the  feature  that  we  are  exploiting  within  each  column  group. 

Since  our  algorithm  is  based  on  convenient  column  groupings,  the  reader  might  be 
tempted  to  think  of  this  as  a  disadvantage  over  row-oriented  schemes  for  large  least- 
squares  problems.  This  is  not  the  case  either  here  or  in  the  nonlinear  equations  problem 
for  that  matter.  See  Coleman  and  Mor6  (1983),  p.208,  for  a  discussion  that  just  the  oppo¬ 
site  is  likely  to  be  the  case.  There  are  a  host  of  schemes  that  can  be  used  to  adapt  the 
ideas  here  to  any  segmentation  of  the  problem  into  groups  of  rows  with  the  elements  in 
each  segment  stored  columnwise  for  which  the  corresponding  rows  of  the  residuals  can  be 
calculated  separately. 

In  Section  2,  we  will  present  the  algorithm  for  the  linear  problem  and  show  that  it 
generalizes  the  group  SOR  method  in  the  sense  that  it  is  that  method  for  the  normal  equa¬ 
tions  of  an  extended  linear  least-squares  problem.  This  observation  will  allow  an  easy  con¬ 
vergence  proof,  even  in  the  singular  case,  using  results  from  Keller  (1965).  Section  3  is  a 
discussion  of  the  nonlinear  least-squares  problem  that  combines  the  algorithm  of  Section  2 
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applied  to  the  linearized  problem  with  the  inexact  Gauss-Newton  approach  as  a  guide  to 
the  accuracy  required  in  solving  the  linearized  problems.  Section  4  presents  some  numeri¬ 
cal  results  for  several  heuristic  column  grouping  schemes  as  well  as  for  the  associated  ord¬ 
erings  of  the  columns  for  point  SOR. 


2.  The  Algorithm  for  the  Linear  Least  Squares  Problem 

Let  4  be  an  m  by  n  real  matrix,  m^n,  6eRm,  and  consider  the  least-squares  problem 

min  II  4x  —  b  ||2.  (2.1) 

In  order  to  illustrate  our  algorithm,  assume  that  the  columns  of  A  are  divided  into  g  groups 
^i,  Az ,  .  .  .  ,  Ag,  where  4j  is  a  m  by  n*  submatrix  and  4*  may  share  columns  with  Ay  Let 
Xi€Rni,  x^R™2,  •  •  •  .  z9€Rni.  The  least-squares  problem  (2.1)  can  now  be  written  as: 

min  j||4iXi+42x2+...+49x9-6||2:  XieR*4,  i=l,2,...,p(.  (2.2a) 

Note  that  (2.2a)  is  really  an  m  by  n=nl+nz+...+ng  least-squares  problem  which  we  will 
denote  using  boldface  type  as: 

min  II  Ax  -  6  ||2  ,  (2  2b) 

where  A=(4il42|...l49)  is  an  mxn  matrix  and  x  =  (x1,x2,  .  .  .  ,xg)T  is  a  n  vector.  Clearly,  A 
has  exactly  the  same  set  of  distinct  columns  as  A  divided  into  the  same  groups,  but  it  will 
be  useful  that  we  can  ignore  overlaps  if  we  view  the  4t  as  column  groups  of  A.  It  will  also  be 
convenient  to  have  the  notation  x^€Rn  and  x^eR”  to  denote  respectively  the  vectors 
obtained  by  starting  with  the  n  or  n  zero  vectors  and  placing  the  nonzero  entries  of  x*  in 
the  positions  corresponding  to  the  column  indices  in  A  or  A  of  4*.  Thus,  given  either 
x€R“,  or  Xj,  x2 ,  .  .  .  ,  xg  with  each  XiGR"*,  we  can  define  x€Rn,  x=xi+x^+  -  •  •  +x^  and 
write 


or 


4x=4xi+4x2+  •  •  •  +4x9=41Xj+42x2+  •  •  •  +AgXg 
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X=X!+I2+  •  •  •  +XS,  Ax=Ax!+Ax2+  •  •  •  +kxg=Alxx+Azxz+  •  •  •  +AgXg. 

Suppose  we  have  an  approximation  z*  to  a  solution  x  to  (2.1),  and  we  divide  z*  into 

,  x% , .  .  . ,  z*  as  above.  (This  division  will  not  be  unique  for  components  corresponding  to 

column  overlaps.)  Then  (2.2)  suggests  the  following  successive  replacements  iteration. 

FOR  i=  1 ,2 . g  DO 

Solve  for  z*+1  :  minf  U±Atf+1  +  £  Ajxf-bWz  :  z^eR^. 

;'-l  j'-i+l 

This  is  a  method  of  projections  [Householder  (1964)],  and  for  the  special  case  of  non¬ 
overlapping  column  groups,  i.e.,  for  formulation  (2.2b),  this  particular  iteration  was  sug¬ 
gested  by  Elfving  (1980).  We  will  see  easily  below  that  this  is  group  Gauss-Seidel  [Young 
(1971),p.438]  on  the  normal  equations  for  (2.2)  and  so  it  is  a  generalized  group  Gauss- 
Seidel  for  (2.1).  Bjork  and  Elfving  (1979)  and  Elfving  (1980)  have  pointed  out  that  in  this 
form  of  the  Gauss-Seidel  iteration,  we  do  not  need  to  explicitly  form  the  normal  equations. 

Let  s*  be  the  step  or  correction,  let  r*  =  Axk  -  b  be  the  residual,  and  notice  that 

Ai4+1  +  £  Ajxrb  =  4isfl-rfc. 

1-  2 

So  we  rewrite  the  iteration: 

FOR  i=l,2 . g  DO 

Solve  for  sf  :  min  j  114^  +  ri+(i_1)/j||  :  s^eR"*  f; 

Update  the  residual:  rJc+i/9  =  t-M*-1)/?  + 

The  new  approximate  solution  is  now 

=  4+sf.  *=i . s;  J**'  =  £xf« 

i=l 

We  complete  this  section  by  stating  the  general  algorithm  with  relaxation  factors  and 
proving  that  it  converges.  In  Section  4,  we  will  discuss  termination  criteria  and  storage 
requirements.  We  will  want  to  assume  that  each  has  full  rank.  This  can  be  done  without 
any  loss  of  generality,  since  any  linearily  dependent  group  of  columns  can  be  split  into 
smaller  groups  of  linearily  independent  columns  by  making  each  column  a  group  by  itself,  if 
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necessary.  Clearly,  we  can  assume  that  there  is  no  zero  column,  since  such  a  column  can 
be  dropped  from  A  without  changing  the  least-squares  problem. 

Subdivided  into  g  groups.  (Each  A{  has  full  rank,  i=  1,2,  •  •  •  ,g.) 

Choose  x?,  i=l,  •  •  •  ,g. 

Compute  r°=dx°-6.  (Choose  0<£Ji<2,  i=l,2,  •  •  •  ,g.) 

FOR  fc=0  STEP  1  UNTIL  Convergence  DO 

FOR  i=l  STEP  1  UNTIL  g  DO 
sf  «  -(AJO-UTt**^; 
r1^  =  +  0,^; 

J+Vg  =  jk+H-XVg  +  cj.5*. 

Check  for  Convergence. 

Theorem  2.1:  Let  the  columns  of  A  be  divided  into  g  groups  dj,  •  •  •  ,  Ag  and  let  each  dt 
have  full  rank.  Let  jx*}  be  generated  by  the  above  algorithm  with  any  choices  0<cji<2,  and 
any  xJcR"4,  t=l,2,  •  •  •  ,g.  Then  {x*}  converges  to  x\  a  solution  to  the  least-squares  prob¬ 
lem  (2.1). 

Proof:  We  will  show  that  the  algorithm  is  the  group  SOR  iteration  on  the  normal  equations 
for  (2.2).  We  will  then  apply  a  result  of  Keller  (1965)  which  proves  convergence  for  the 
group  SOR  method  applied  to  positive  semi-definite  systems.  This  will  give  convergence  of 
and  hence  of  |x*(  and  of  |x*  j  for  every  i.  But  then  fx*(  converges  since  it  is  just 

iix!!- 

i-1 

We  now  state  the  result  of  Keller  (1965)  for  completeness.  Let  C  be  a  real  symmetric 
matrix  of  order  n  of  the  form 

G  =  D  +  E  +  Et  (2.3) 

and  let  W  be  any  real  nonsingular  matrix  such  that 

M  =  W~lD  +  E 

is  nonsingular.  Let  /  be  any  vector  for  which  the  system 


(2.4) 
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Gx=f  (2.5) 

has  a  solution.  Consider  the  iterative  method 

Mxk+1  =  (tf-C)x*  =  /  (2.6) 

where  x°  is  an  initial  guess  of  a  solution  of  (2.5).  The  following  lemma  is  a  part  of  Corollary 

2.1  of  Keller  (1965). 

Lemma  2.2.  Let  G  be  a  symmetric  positive  semidefinite  matrix  of  the  form  (2.3)  and  let  W 
be  nonsingular  such  that  the  matrix  N  in  (2.4)  is  nonsingular.  Let  (2.5)  have  a  solution 
and  let 

P  =  W_1D  +  (W~'D)t  -  D 

be  positive  definite.  Then  for  every  x°  the  sequence  |z*|  of  (2.6)  converges  to  a  solution  of 
(2.5). 


It  will  be  useful  to  let  G^=AjAj  be  the  (i,j)  block  of  the  n  by  n  Gram  matrix,  G= ArA. 
Define 

Xj  =  x]  +  2  otf  >=1,2,  •  •  •  ,g,  and  xk+i/g  =  . 

P-0  jm  1  jm  i 

First  we  need  that 

7 k+i/g  =  Ax*+i/g-b. 

We  will  prove  this  by  induction  on  k+i/g  in  steps  of  1  /g.  By  definition,  the  statement  is 
true  for  k+i/g= 0.  Now  assume  that  the  statement  is  true  for  some  fc+(i—  l)/g  ^  0.  Then 

jJc+i/g  -  )/g  +  Wij4.sJ  =  Axk+^i~^/g  -  b  +  07*4^ 

=  A[ x*+<<-1>/»  +  -  6  =  Axk+i/g-b. 

Now, 

-  x*]  =  c JiCus*  =  -cji4f[>xfc+(i-1)/»  -  6] 

=  -  ui[ +  ^GijXj  -  4ib], 

7=1  M 


which  becomes: 
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G*Xi+1  +  cJi[2%^+1  +  i,  GijXj]  +  («i-l)Ci iXki  =  cjj4[6. 

;-l  j-i+1 

When  cji=cj,i=lI2,...,g  this  is  the  standard  form  given  on  p.  438  of  Young  (1971)  of  the 
group  SOR  method  applied  to  Gx=ATb.  To  apply  Keller’s  result,  we  write  G=D+E+ET 
where  D=(Cii)  is  the  nxn  block  diagonal  of  G,  and  £’=(Cij)  is  the  nxn  block  strict  lower 
triangle  of  G.  Let  W  be  the  nxn  block  diagonal  matrix  whose  ith  block  is  a;*  times  the 
74x71*  identity  matrix.  Now  we  rewrite  the  iteration  as 

Dxk+l  +  W[Exk+l  +  ETxk]  +  ( W-I)Dxk  =  WATb 

(D+WE)xk+1  +  [W(D+ET)-D]xk  =  WATb. 

Since  no  Wj= 0,  we  can  multiply  through  by  IT-1,  set  M=W~1D+E,  and  obtain 

tfxfc+1  +  (G-N)xk  =  ATb. 

In  order  to  complete  the  proof,  we  only  need  that  N  is  nonsingular  and  that 
(2 W~X-I)D  is  positive  definite.  Observe  that,  since  each  Ai  has  full  rank,  D  is  positive 
definite.  It  follows  that  W~1D  is  nonsingular  and  so  N  is  also.  In  fact,  W~l  is  blocked  into 
constant  diagonal  matrices  so  that  0<Ui<2  ensures  that  (2W~1-I)D  is  symmetric  and  posi¬ 
tive  definite. 


3.  The  Inexact  Gauss-Newton  Approach 

We  now  consider  the  algorithm  for  the  nonlinear  least  squares  problem.  Let 
F:  Rn-»  Rm,  to  ^  n,  and  F  =  (F{ . FJT, 

and  define 

?(*)- 

Then  the  nonlinear  least  squares  problem  is  to  find  x  so  that  for  some  £>0, 

(, p(x ')  ^  <p(x )  for  all  I  lx — x*  II  <  £.  (3.1) 

The  simultaneous  solution  of  n  nonlinear  equations  in  n  unknowns  may  be  viewed  as  solving 

(3.1)  where  m=n.  For  small-residual  sparse  problems,  the  Gauss-Newton  method  is  very 

attractive.  It  starts  with  x°  and  generates  a  sequence  of  iterates  j  xk  (  as  follows: 


B 


FOR  Jk=0  STEP  1  UNTIL  Convergence  DO 

Solve  for  s*:  min)  II  F'(xk)s  +  F(x*)|l2  :  s  £  Rn  |  ;  (3.2) 

Set  xk+1  =  x*  +  s*. 

If  mxn  is  large,  solving  the  linearized  problem  (3.2)  may  require  the  techniques  men¬ 
tioned  above  for  (2.1). 

If  we  use  an  iterative  method  to  solve  the  linearized  problem,  then  it  is  important  to 
know  how  accurately  it  must  be  solved  in  order  to  not  impede  convergence.  We  define  the 
Inexact  Gauss-Newton  algorithm  for  a  given  real  non-negative  sequence  and  starting 
point  x°  as  follows: 

FOR  Jfc=0  STEP  1  UNTIL  Convergence  DO 

Find  some  approximate  minimizer  s*  of  (3.2)  so  that 

s  ,here  ** =  fV)+f'W:  <3-3) 

Set  xk+1  —  x*+  s*. 

where  INI  is  a  vector  norm  or  a  consistent  matrix  norm.  Let  x  be  a  solution  of  (3.1)  and 
define  the  norm  IlylU  =  \\F'(x')TF'(x')y\\.  We  will  assume  in  this  section  that: 

A.l.  F  is  continuously  differentiable  in  an  open  neighborhood  0  of  x’; 

A.2.  F'(x)rF{x')  =  0; 

A.3.  F'( x*)  has  full  rank; 

A.4.  There  exists  7^0  so  that  for  x£fl  , 

\\(F'(x)-F'(x))TF(x)\\  £  7II1-1II. .  (3.4) 

Notice  that  A.2  is  somewhat  redundant  since  we  are  assuming  that  x  solves  (3.1).  Also,  if 
the  Jacobian  matrix  is  Lipschitz  continuous  at  x\  i.e.,  there  exists  IS  0  so  that  for  x£Q, 

\\F'(x)-F'{x)\\z  £  Lllx— x’ll2; 

then  there  exists  7^0  so  that  assumption  A.4  holds  in  the  l2  norm.  Notice  that  7=0  for 
zero-residual  problems.  The  following  theorem  relates  {77^  to  the  speed  of  convergence  of 


Theorem  3.1:  Assume  that 


7  +  77(1+7)  <  r  <  1  (3.5) 

for  7  from  (3.4)  and  0  ^  0*  ^  77  from  (3.3).  Then  there  exists  some  £> 0  for  which 

||x°— x*||.^  £  implies  that  the  sequence  of  inexact  Gauss-Newton  iterates  {  x*  £  is  defined 

and  converges  at  least  linearly  to  x  in  the  sense  that 

llxfc+1-x||.  ^  r||xt-x,||,. 

Proof:  Let 


M  =  2  max  \  ||f'(**)r||,  ||(F'(x*)r/’'(x))-1||  \  (3.6) 

and  let  <5>0  be  so  that 

(1+/J.6)  |^77[l+7+(yLi+l)d]+7+/Li<5j  <  r. 

This  can  be  done  in  view  of  (3.5).  Define 

G(x)  =  F'(x)tF'(x),  C'  ^  F'(x)tF'(x). 

Choose  £>0  so  that  if  ||x-x*||.  ^  £,  then  l|C-1(x)||  £  fx  ,  ||C*-C(x)ll  £  <5  ,  ||F'(x)r||  g  fx  , 
and  I \F(x’) -F(x) -F' (x) (x* -x) ||  ^  <5Ux — x*ll«  .  This  can  be  done  in  view  of  (3.6)  and  the 
assumptions  A.l  and  A.3.  Let  x+  be  the  new  inexact  Gauss-Newton  iterate,  i.e.  x+  satisfies 

lHa)^(x)||  "  TJ’  Where  r  =  f(l)  +  ’  I+=  *  +  & 

Consider 


C*( x+-x)  =  [(G‘-G)G~i+I]  [F'(x)Tr-  (F'(x)-F'(x))TF(x) 

+  F'(x)t(F(x)-F(x)-F'(x)(x-x))]. 


Taking  norms  yields 


|x+-x 


I.  £  [l  +  IIC(x)-1||  HC*-C(x)ll] 

X  [||.F'(x)rrl|  +  \\{F'(x)-F'(x))tF(x)\\  +  \\F'(x)t\\  II F(x)-F(x)-F'(x)(x- 
^  (l+/z<5)  |^77llF'(x)r/’(x)l|+7llx-x,||.+la(5llx-x‘l|.j  . 


Consider 
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F'(x)tF(x)  =  (F'(x)-F'(x,))tF(x)  -  F'(x)T(F'(x)-F(x)-F'(x)(xt-x)) 

-  (G(x)-G*)(x'—x)-G*(x’-x) . 

Taking  nonns, 

[\F'(x)tF(x)\\  £  yllx-ill.  +  //<5||x*-x||.  +  <5llx*-x||.  +  llx-xlk 
So 

||x+-x*||.  £  (1+yucS)  ^[l+7+(/x+l)6]+7+^(jj  ||x-x*||.  ^  r  lix-x*lk 

If  7  =  0,  then  the  inexact  Gauss-Newton  method  is  closely  related  to  the  inexact 
Newton  method  of  Dembo,  Eisenstat,  and  Steihaug  (1982)  and  the  inexact  quasi-Newton 
method  of  Steihaug  (1980).  In  the  case  when  6 0,  this  theorem  relaxes  the  Dermis 
(1977)  conditions  for  convergence  of  the  Gauss-Newton  method.  Furthermore,  if  F  is  twice 
continuously  differentiable,  then  we  can  apply  the  standard  Ostrowski  Theorem  to  the 
Gauss-Newton  method  as  in  Ortega  (1972).  Now  we  will  show  that  if  the  assumptions  in  the 
standard  Ostrowski  Theorem  are  satisfied,  then  (3.4)  holds. 

In  the  following  discussion,  let  F  be  twice  continuously  differentiable  in  an  open  neigh¬ 
borhood  0  containing  x*.  For  x  sufficiently  close  to  x\  A.3  lets  us  define 

N(x)  =  x  -  [F'(x)rF'(x)]_1r(x)rF(x). 

Then  the  derivative  of  N  exists,  is  continuous  in  a  neighborhood  of  x*  and 

r(x)  =  -  [F'(x*)rr(x*)]_1s 

where 

5  =  2A(x,)V2Fi(x) 

t=i 

(See  8.1.8  in  Ortega,  1972,  and  Dennis,  1977.)  Recall  that  the  Gauss-Newton  method  is 
x*+1  =  M(xk)  and  x*  is  a  point  of  attraction  of  the  the  Gauss-Newton  iteration  if 
p(AT(x*))  <  1  where  p(')  is  the  spectral  radius  of  the  matrix  (see  8.1.7  in  Ortega,  1972). 
Define  the  function  h: 0-»Rn  by 

h(x)  =  (F'(x)-F'(x))TF(x). 

The  assumption  that  F  is  twice  continuously  differentiable  in  an  open  neighborhood  of  t 
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and  the  assumption  A.2  give  A'(x’)  =  S.  Assume  for  some  positive  <5  we  have 
p(N'(x’))+6  g  7  <  1.  Choose  a  norm  INI  so  that  ||W'(x*)rll  ^  p(M'(x'))+6/ 2.  We  can  find 
a  vector  norm  that  is  consistent  with  the  chosen  matrix  norm,  and  choose  a  neighborhood 
of  radius  c  so  that  for  all  III -i’ll  ^  £  we  have 


||/i(x)  —  A(x*)  -  /i'(x*)(x-x*)ll  ^  llx-x’ll. 

This  can  be  done  since  h  is  continuously  differentiable.  Consider 

h(x)  =  /i'(x‘)(x-x* )  +  h(x)-h(x *)  -  h'(x’)(x-x') 
and  note  that  /i'(x’)(x-x‘)  =  -N'(x')T[F'(x')TF'(x')(x-x')].  Taking  norms  yields 


\\(F'(x)-F'(x))tF(x)\\  =  ||fc(x)||  <  ||A'(x*)(x-x*)||  +  ||A(x)-fi(x,)-/i'(a:,)(i-a:,)ll 


nr(x*)rn  +| 


llx-x’ll. 


£  (p(r(x))+<5)  llx-x’ll.  S  7  II*-*  IU 

which  shows  (3.4). 

In  the  inexact  Gauss-Newton  approach,  we  ignore  the  specific  method  we  are  using  to 
find  an  approximate  minimizer  s*  of  (3.2).  If  F'  is  sparse,  then  as  in  Curtis,  Powell,  and 
Reid  (1974),  and  Coleman  and  Mor6  (1982),  (1983),  we  may  group  the  columns  of  F'  so 
that  the  columns  in  each  group  are  mutually  orthogonal  vectors.  We  note  that  a  column 
can  be  in  several  groups.  The  columns  F'{ x*)*  in  group  i  may  be  approximated  by  finite 
differences  AF(xi)i  with  only  one  extra  value  F(xk+Vi),  where  v*  is  an  appropriate  linear 
combination  of  the  corresponding  standard  unit  vectors.  For  SieR”4,  let  si€Rn  be  con¬ 
structed  as  in  Section  2.  This  suggests  the  following  cycle  in  the  inner  loop: 
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For  given  x*,  let  r*  =  y*  =  z*; 

Inner  loop  cycle: 

FOR  i=l  STEP  1  UNTIL  y  DO 

Compute  Ak  =  F'(xk)i  or  Af^x*)*; 

Solve  for  sf:  minjlU*s*+r,:+(1~D/ff||2  :  s^R"^; 

Set  r*+i/9  =  rt+^-1)/s  + 

Set  yi+i/ *  =  y*^"1**  4  o&; 

Check  termination  (3.3). 

The  next  iterate  is  now  xi+1  =  yk+e  G  Rn  where  c  is  the  number  of  cycles,  which 
corresponds  to  terminating  the  inner  iteration  after  c  sweeps  through  each  of  the  column 
groups.  The  least-squares  problem  (3.6)  is  trivial  to  solve  when  the  columns  in  this  group 
are  mutually  orthogonal.  This  especially  convenient  way  to  group  the  columns  has  been 
discovered  independently  by  Coleman  (1984). 

If  c>l,  then  the  above  inner  loop  cycle  requires  either  recomputing  F'(x*)i  or  AF(x*)i 
when  needed,  or  storing  F'(x*)  or  AF(xk).  An  alternative  approach  recomputes  the  Jacobian 
matrix  of  one  particular  group  at  a  time  and  updates  the  nonlinear  residual.  This  would 
suggest  the  following  nonlinear  substitution  method: 

Given  x°,  compute  F(x°) 

FOR  Jfc=0  STEP  1  UNTIL  Convergence  DO 

FOR  i=l  STEP  1  UNTIL  g  DO 

Compute  4+(i-1)/j  =  F'(xi+(<-1)/?)i  or  AF(x*+(i_1)/9)i; 

Solve  for  :  min{IMi+^_1^/jsi+F(x*+^“^/s)ll2  :  s^gR71^; 

Set  xk+i/3  =  +  Ui4; 

Check  convergence. 


4.  Numerical  Results 

In  this  section,  we  describe  two  column  grouping  strategies  to  be  used  with  the  algo¬ 
rithms  given  in  Sections  2  and  3,  and  we  present  some  numerical  results  for  the  Duff  and 
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Reid  (1979)  sparse  least-squares  test  problems.  We  begin  with  a  discussion  of  the  prob¬ 
lems. 

4.1.  The  Test  Problems 

These  problems  are  specified  only  in  their  sparsity  structures  which  come  from  adjust¬ 
ment  of  survey  data  (Matrix  28  to  32  in  the  test  bed). 

Problem  1:  A  is  219  by  85  and  the  survey  pattern  is  from  Holland. 

Problem  2:  A  is  958  by  292  and  the  survey  pattern  is  from  United  Kingdom. 

Problem  3:  A  is  331  by  104  and  the  survey  pattern  is  from  Scotland. 

Problem  4:  A  is  608  by  188  and  the  survey  pattern  is  from  England. 

Problem  5:  A  is  313  by  176  and  the  survey  pattern  is  from  Sudan. 

The  specific  problems  used  here  were  found  by  generating  the  nonzero  matrix  elements 
randomly  in  the  interval  (-1,1)  and  the  components  of  a  solution  vector  x  randomly  in  the 
interval  (0,1).  The  righthand  side  b  was  found  by  computing  b-Ax.  The  nonlinear  problems 
were  found  by  replacing  x j  by  xj,  i.e.,  component  i  in  F  is 

Fi(*)  =  T,Avxi  ~K 

Thus,  our  problems  have  zero  residuals  at  the  solution.  We  approximate  all  derivatives  by 
finite  differences  in  these  tests. 

4.2.  The  Column  Grouping  Schemes 

We  have  already  mentioned  that  one  grouping  scheme  is  based  primarily  on  the  ideas 
of  Curtis,  Powell,  and  Reid  (1974)  as  expanded  and  improved  by  Coleman  and  Mor6 
(1983).  A  FORTRAN  code  found  in  Coleman  and  Mor6  (1982)  furnished  our  first  pass  par¬ 
titioning  the  columns  of  A  into  disjoint  groups.  We  will  refer  to  this  work  as  'CM1.  In  Sec¬ 
tion  1,  we  argued  that  there  could  be  some  advantages  to  allowing  the  groups  to  overlap  in 
some  columns.  In  our  tests,  we  used  the  following  heuristic  to  expand  each  group  in  turn. 
To  expand  a  given  group,  we  first  mark  all  columns  that  have  a  nonzero  element  in  the 
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same  row  position  as  some  column  in  the  group.  This  identifies  the  columns  that  can  not  be 
added  to  the  group.  We  then  add  one  unmarked  column  to  the  group  and  add  to  the  set  of 
marked  columns  all  columns  that  have  a  nonzero  row  element  in  the  same  position  as  the 
column  that  was  added  to  the  group.  This  process  is  then  repeated  until  no  columns  are 
left  unmarked. 

Finally,  let  Ai  denote  a  resulting  submatrix  of  columns  dj  of  A,  j£lit  then  AjAi  is  a  diag¬ 
onal  matrix  where  the  diagonal  elements  are  the  squares  of  the  ^-norms  of  the  columns,  so 
Ai  has  full  rank,  as  we  required  in  Theorem  2.1.  As  an  example,  we  present  in  Table  1  the 
results  of  this  scheme  applied  to  Problem  3. 


Group 

Number  of  Columns 
CM  Expanded 

1 

25 

25 

2 

25 

25 

3 

25 

25 

4 

20 

23 

5 

8 

25 

6 

1 

25 

Table  1:  Groups  in  Problem  3 

We  note  that  when  the  groups  are  expanded,  for  the  last  groups  the  increase  in  number  of 
columns  is  larger  than  for  the  first  few.  This  is  to  be  expected  for  most  sparsity  structures 
by  the  way  the  methods  of  partitioning  the  columns  work. 

We  also  considered  an  expansion  of  the  groups  of  columns  beyond  mutual  orthogonal¬ 
ity  to  the  case  where  the  normal  equations  for  the  least-squares  subproblems  are  banded. 
In  particular,  we  used  the  following  sequential  heuristic  algorithm  to  group  the  columns  so 
that  A{Ai  is  tridiagonal.  Initially,  the  columns  are  ordered  according  to  some  criteria  like 
the  incidence  degree  ordering.  Choose  the  first  column,  mark  it,  and  let  all  other  columns 
be  unmarked.  This  will  be  our  first  column  in  the  group.  Choose  the  first  of  the  unmarked 
columns  that  have  a  nonzero  element  in  a  same  row  position  as  the  first  column  and  mark 
all  columns  that  have  an  element  in  any  same  rowposition  as  the  first  column.  This  new 
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column  is  our  next  column  in  our  group.  This  process  is  now  repeated  until  a  column  has 
no  unmarked  columns  with  an  element  in  any  same  rowposition.  At  this  point,  either  all 
columns  are  marked  or  there  are  an  unmarked  column.  Choose  the  first  unmarked  column 
and  continue  the  process  until  all  columns  are  marked.  We  have  now  generated  one  group 
of  columns  so  that  the  the  normal  matrix  A+Ai  is  block  tridiagonal.  Columns  in  different 
blocks  in  the  same  group  are  orthogonal.  Unmark  all  columns  except  the  columns  already 
in  a  group.  Repeat  the  process  by  chosing  the  first  unmarked  column.  We  illustrate  this 
grouping  strategy  on  Problem  3. 


Group 

Number  of 
columns  blocks 

1 

45 

5 

2 

41 

9 

3 

17 

11 

4 

1 

1 

Table  2:  Groups  in  Problem  3 


4.3.  Storage  Requirements 

It  is  of  interest  to  compare  the  storage  requirements  of  the  algorithm  of  Section  2 
applied  directly  to  these  problems  to  the  requirements  of  a  very  good  package  for  sparse 
symmetric  and  positive-definite  systems  applied  to  the  normal  equations.  In  the  following 
table,  columns  A  and  ATA  give  the  storage  required  for  the  real  nonzero  elements  in  A  and 
the  lower  triangular  part  of  A7 A,  as  well  as  the  associated  integer  pointers  when  we  use  the 
storage  scheme  of  the  Harwell  testbed.  Column  L  gives  the  storage  requirements  for  the 
Yale  Sparse  Matrix  Package  (YSMP)  [Eisenstat  et  al  (1982)]  to  store  the  lower  triangular 
factor  of  AtA. 
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Storage  Requirement 

Problem 

m 

n 

A 

ATA 

L 

Real 

Int 

Real 

Int 

Real 

Int 

219 

85 

438 

524 

304 

390 

520 

642 

958 

292 

1916 

2209 

1250 

1543 

2568 

2497 

.Ka 

331 

104 

662 

767 

435 

540 

774 

812 

608 

188 

1216 

1405 

796 

985 

1625 

1609 

313 

176 

1557 

1734 

1485 

1662 

1593 

1210 

Table  3:  Storage 


For  our  scheme,  if  A(Ai ,  i=  1,2,  •  ■  ■  ,g  are  diagonal  matrices,  we  do  not  need  the  vec¬ 
tor  4  explicitly.  Instead,  when  we  compute  the  components  of  4?V*+(1-1)/0,  we  also  compute 
the  components  of  sf  and  accumulate  the  innerproduct  (A[rk+(t~1)/g)Ts£.  Hence  the  only 
storage  that  is  needed  is  the  original  data  A,  and  6  (overwritten  by  rt+i/9),  and  the  solution 
vector  x  plus  some  additional  pointer  storage  for  the  groups.  If  A(Ai  ,  i=l,2,  •  •  •  ,g  are  tridi¬ 
agonal,  then  we  need  the  LDLT  factorization  of  the  tridiagonal  matrices  and  the  vector  . 
For  the  inexact  Gauss  Newton  method,  we  need  to  store  the  Jacobian  matrix.  For  the  non¬ 
linear  substitution  method,  we  need  only  one  extra  vector  of  length  m  if  the  columns  of  the 
Jacobian  matrix  or  its  approximant  in  each  group  have  no  elements  in  the  same  rowposi- 
tions. 


4.4.  Numerical  Experiments 

Now  we  briefly  discuss  the  termination  criteria  that  we  use.  From  the  definition  of 
rt+i/J  and  the  choice  of  s*.  we  have 

l|rfc+i/ffll|=  ||rt+(*“1)/,|||  4-  2cji(4[ri+(1-1)/J)7'si  +  «?(4jUiSi)rs£ 

=  ||r*+(i_1)/9|||  +  Wi(2 -«i)(iliV+(4~lV')rsf. 

The  major  work  required  to  calculate  the  l2-norm  of  the  residual  is  an  extra  innerproduct 
since  is  already  computed  as  in  Subsection  4.3.  Since  we  want  to  compare 

different  algorithms,  we  need  to  base  our  stopping  criteria  on  a  monotonically  decreasing 
sequence.  This  suggests  the  following  termination  rule: 
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llr^lla 

Hi*!  la 


g  £. 


(4.1) 


In  the  inexact  Gauss  Newton  method,  we  base  the  termination  rule  on  the  residual  Att  in 
the  normal  equations.  We  note  that  this  costs  one  matrix-vector  product  per  iteration.  We 
terminate  the  inner  loop  cycle  when  (3.3)  holds.  For  the  nonlinear  problems,  the  outer  loop 
is  terminated  when 


£  e.  (4.2) 

As  explained  above,  one  grouping  scheme  begins  by  using  CM  graph  coloring  to  parti¬ 
tion  the  columns  of  A,  and  then  we  use  the  heuristic  strategy  to  expand  the  groups. 
Further,  to  terminate  the  iteration  to  solve  the  linear  problem,  we  use  (4.1).  Table  4  com¬ 
pares  the  CM  grouping  to  the  expanded  groups  that  ‘overlap’.  The  entries  in  the  tables 
are:  in  the  column  marked  ‘it’  for  iterations,  the  numbers  kg+i  in  the  notation  from  Sec¬ 
tion  2  of  diagonal  least-squares  problems  solved;  we  also  include  in  the  column  marked 
‘vup’  the  total  number  of  variables  that  were  updated.  Since  the  block  matrix  Afa  is  diago¬ 
nal,  the  CM  grouping  is  an  point  SOR  using  the  CM  grouping  as  the  ordering.  The  number 
of  variables  updated  is  therefore  the  number  of  point  SOR  corrections.  In  Table  4,  we 
choose  Wj=l. 


\\F(^)\\ 

ILF(z°)l! 


Number  of  least  squares  problems  solved 

Problem 

g 

£  = 

1 

£  = 

01 

£  = 

001 

Overlap 

CM 

Overlap 

CM 

Overlap 

CM 

it 

vup 

it 

vup 

it 

vup 

it 

vup 

it 

vup 

it 

vup 

1 

4 

8 

188 

8 

170 

19 

449 

19 

408 

35 

825 

35 

727 

2 

6 

9 

633 

9 

505 

22 

1524 

23 

1166 

39 

2713 

41 

2042 

3 

6 

9 

223 

9 

179 

28 

690 

28 

511 

86 

2122 

86 

1506 

4 

6 

9 

411 

9 

321 

24 

1092 

31 

987 

52 

2362 

68 

2160 

5 

10 

33 

703 

33 

602 

123 

2584 

123 

2186 

242 

5051 

235 

4160 

Table  4:  Overlap  vs  CM. 


Table  4  indicates  a  fast  decrease  in  the  residual  in  the  first  few  iterations.  This  can  be 
explained  from  the  observation  that  the  iterative  process  is  somewhat  related  to  coordinate 
search  for  the  least-squares  problem,  where  the  spans  of  the  A±  act  like  coordinates.  Notice 
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further,  there  is  basically  no  difference  in  Table  4,  where  w  =  l,  between  partitioning  the 
columns  and  allowing  overlaps  if  we  count  the  number  of  iterations.  We  see  a  bigger 
difference  between  overlapping  and  partitioning  for  w#  1  than  for  cj=1.  Perhaps  this  can  be 
explained  from  the  observation  that  if  the  column  at  of  A  is  in  group  i  and  i+1,  then  for 
cj=1,  component  t  of  ^+i  is  0,  but  for  cj^I,  this  component  can  be  nonzero.  However,  in 
terms  of  equivalent  point  SOR  (or  point  Gauss  Seidel)  corrections,  we  see  that  for  the 
linear  problem  it  does  not  pay  to  expand  the  groups.  On  the  other  hand,  in  the  following 
three  sets  of  results  for  the  nonlinear  problems  and  the  nonlinear  substitution  technique, 
we  see  that  overlap  may  be  more  efficient  in  terms  of  fewer  iterations  and  function  calls. 

Of  course,  this  is  hardly  surprising,  but  the  extra  function  calls  used  by  nonlinear  substitu¬ 
tion  make  it  less  attractive  than  the  inexact  Gauss-Newton  method,  unless  storage  is  the 
main  concern. 


Problem  1 


£=.l  £  =  .01  £  =  .001 

CM  ordering 

Nonlinear  substitution 
Inexact  Gauss  Newton 

19/9 

11/19/2 

45/22 

21/51/4 

1  I 

Overlap 

Nonlinear  substitution 
Inexact  Gauss  Newton 

19/9 

11/19/2 

45/22 

21/51/4 

81/40 

26/70/5 

Problem  2 


£  =  .l  £  =  .01  £=.001 

CM  ordering 

Nonlinear  substitution 
Inexact  Gauss  Newton 

23/11 

15/24/2 

63/31 

22/44/3 

105/52 

36/90/5 

Overlap 

Nonlinear  substitution 
Inexact  Gauss  Newton 

23/11 

15/22/2 

57/28 

29/61/4 

103/51 

36/84/4 

Problem  3 
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£  =  .l  £  =  .01  £  =  .001 

CM  ordering 

Nonlinear  substitution 
Inexact  Gauss  Newton 

27/13 

15/23/2 

57/28 

29/85/4 

115/57 

36/84/5 

Overlap 

Nonlinear  substitution 
Inexact  Gauss  Newton 

23/11 

15/22/2 

55/27 

29/84/4 

f/i/o  ,f=number  of  function  call,  i=number  of  iterations,  o=number  of  outer  iterations 


Table  5:  Results  for  Nonlinear  Test  Problems 


In  the  following  tables,  we  compare  the  CM  grouping  strategy  and  a  grouping  strategy 
so  that  the  corresponding  block  in  the  normal  equations  is  block  tridiagonal.  The  strategies 
are  compared  to  point  SOR  with  the  original  ordering.  The  entries  for  the  grouping 
schemes  are  the  numbers  of  variables  updated  to  achieve  the  specified  accuracy.  For  point 
SOR  with  the  original  ordering,  they  are  the  numbers  of  variable  updates  needed  to  achieve 
the  same  accuracy  as  the  grouping  scheme.  The  arithmetic  needed  by  CM  and  point  SOR 
with  original  ordering  has  the  same  cost.  Naturally,  the  tridiagonal  case  costs  more  per 
variable  update.  However,  the  dominating  cost  for  all  the  methods  is  the  two  matrix-vector 
products  for  each  sweep  through  all  the  columns  .  The  final  line  in  the  tables  is  the  relative 
efficiency  in  point  SOR  corrections  of  the  two  grouping  strategies,  i.e.  for  Problem  1  with 
«  =  1.1  and  £  =  .l  it  is  (191/162)/(  17 1/ 153)  =  1 .05. 
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Point  SOR  with  CM  Ordering  vs.  Point  SOR  with  Original  Column  Ordering 

and 

Tridiagonal  Blocks  vs.  Point  SOR  with  Original  Ordering 


Problem  1 


£  =  .l 

CJ 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

281 

217 

196 

170 

153 

170 

196 

238 

302 

387 

557 

Point  SOR 

304 

238 

223 

210 

171 

210 

237 

275 

346 

434 

596 

Tridiagonal 

255 

247 

210 

170 

162 

162 

210 

247 

295 

417 

587 

Point  SOR 

296 

268 

238 

214 

191 

208 

269 

288 

346 

463 

625 

Rel.  eff. 

1.07 

.99 

1.0 

1.02 

1.05 

1.04 

1.06 

1.01 

1.02 

.99 

1.0 

Problem  2 


£  =  .l 

CJ 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

797 

584 

559 

505 

505 

559 

658 

851 

1089 

1381 

1965 

Point  SOR 

802 

634 

568 

552 

560 

664 

807 

985 

1234 

1535 

2119 

Tridiagonal 

814 

584 

577 

522 

522 

577 

705 

814 

1106 

1398 

1982 

Point  SOR 

828 

664 

630 

569 

613 

747 

848 

975 

1277 

1557 

2164 

Rel.  eff. 

1.01 

1.05 

1.12 

1.0 

1.06 

1.09 

.98 

1.03 

1.02 

1.00 

1.01 

Problem  3 


£  =  .l 

CJ 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

283 

233 

199 

179 

179 

207 

233 

303 

387 

491 

699 

Point  SOR 

271 

233 

197 

198 

202 

230 

278 

329 

426 

537 

745 

Tridiagonai 

294 

253 

190 

190 

190 

190 

253 

294 

398 

502 

710 

Point  SOR 

297 

262 

198 

205 

232 

222 

305 

322 

436 

540 

757 

Rel.  eff. 

1.05 

1.04 

1.05 

.98 

1.08 

1.05 

1.01 

1.01 

1.0 

.98 

1.00 
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Problem  4 


£  =  .l 

CJ 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

548 

423 

360 

321 

321 

360 

468 

548 

697 

924 

1261 

Point  SOR 

521 

431 

341 

339 

360 

387 

529 

612 

762 

984 

1322 

Tridiagonal 

529 

453 

373 

341 

341 

373 

453 

529 

717 

905 

1281 

Point  SOR 

513 

474 

365 

343 

360 

461 

513 

597 

799 

988 

1360 

Rel.  eff. 

1.02 

1.03 

1.03 

.95 

.94 

1.15 

1.00 

1.01 

1.02 

1.03 

1.01 

Problem  5 


£  =  .l 

CJ 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

876 

778 

667 

602 

602 

602 

602 

654 

778 

973 

Point  SOR 

802 

736 

614 

583 

589 

614 

633 

682 

816 

1016 

1351 

Tridiagonal 

824 

704 

621 

562 

562 

562 

562 

648 

738 

973 

1294 

Point  SOR 

805 

715 

612 

583 

588 

613 

631 

700 

814 

1028 

■EH«a 

Rel.  eff. 

1.07 

1.07 

1.07 

1.07 

1.07 

1.07 

1.07 

1.04 

1.05 

1.01 

Mufti 
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In  the  next  group  of  tables,  we  tried  point  SOR  in  every  case,  but  the  orderings  used 
were  the  orderings  suggested  by  the  column  grouping  schemes  discussed  previously.  Except 
for  Problem  5,  it  appears  as  though  the  alternate  orderings  are  very  good  ones.  The  rows 
labeled  ‘Point  SOR’  correspond  to  the  original  ordering. 

Point  SOR  with  CM  Ordering  vs.  Point  SOR  with  Original  Ordering 

and 

Point  SOR  with  Tridiagonal  Ordering  vs.  Point  SOR  with  Original  Ordering 


Problem  1 


£  =  .001 

CJ 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

1598 

1258 

961 

727 

557 

493 

536 

680 

897 

1173 

1683 

Point  SOR 

1870 

1530 

1273 

1020 

850 

680 

610 

762 

977 

1277 

1747 

Tri 

1400 

1145 

935 

765 

635 

465 

550 

672 

890 

1182 

1692 

Point  SOR 

1906 

1530 

1273 

1020 

903 

735 

677 

777 

982 

1281 

1756 

Problem  2 


(*) 

II 

O 

o 
» — » 

e 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

4162 

3286 

2603 

2042 

1606 

1534 

1898 

2410 

3066 

4162 

5815 

Point  SOR 

4381 

3505 

2886 

2311 

1850 

1875 

2268 

2751 

3377 

4389 

6009 

Tri 

3734 

3041 

2457 

1982 

1581 

1398 

1873 

2329 

3041 

4026 

5778 

Point  SOR 

4360 

3563 

2899 

2317 

2020 

1783 

2299 

2743 

3448 

4344 

6068 

Problem  3 


£  =  .001 

CJ 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

2175 

1947 

1714 

1506 

1298 

1115 

927 

832 

1090 

1447 

2071 

Point  SOR 

1762 

1559 

1450 

1247 

1141 

1034 

926 

939 

1224 

1560 

2184 

Tri 

1438 

1230 

1039 

918 

773 

669 

669 

831 

1085 

1438 

2062 

Point  SOR 

1763 

1560 

1352 

1349 

1142 

1038 

933 

972 

1224 

1560 

2186 
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Problem  4 


£  =  .001 

CJ 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

3476 

2992 

2536 

2160 

1864 

1488 

1220 

1551 

1972 

2631 

3744 

Point  SOR 

2821 

2316 

1881 

1676 

1496 

1308 

1354 

1725 

2138 

2797 

3911 

Tri 

2521 

2145 

1769 

1469 

1205 

1017 

1205 

1501 

1957 

2629 

3725 

Point  SOR 

2823 

2388 

1960 

1685 

1488 

1309 

1448 

1709 

2159 

2851 

3891 

Problem  5 


£  =  .001 

CJ 

.7 

.8 

.9 

1.0 

1.1 

1.2 

1.3 

1.4 

1.5 

1.6 

1.7 

CM 

7602 

6156 

5040 

4160 

3418 

2834 

2306 

2268 

2674 

3418 

4688 

Point  SOR 

7126 

5744 

4811 

4133 

3632 

3456 

3280 

3095 

2959 

3663 

4861 

Tri 

8306 

7384 

6574 

5842 

5138 

4434 

3758 

3112 

2702 

3464 

4962 

Point  SOR 

7125 

5744 

4812 

4133 

3684 

3456 

3280 

3095 

2959 

3663 

4855 

24 


References 

A.  Bjork  and  T.  Elfving  (1979),  Accelerated  projection  methods  for  computing  pseudoin¬ 
verse  solutions  of  systems  of  linear  equations,  BIT,  19,  pp.  145-163. 

T.F. Coleman  (1984),  Large  Sparse  Numerical  Optimization,  Lecture  Notes  in  Computer 
Science  165,  Springer- Verlag,  Berlin. 

T.F. Coleman  and  J.J.Mor6  (1982),  Software  for  estimating  sparse  Jacobian  matrices,  Cor¬ 
nell  Computer  Science  TR  82-502. 

T.F.Coleman  and  J.J.Mor6  (1983),  Estimation  of  sparse  Jacobian  matrices  and  graph  color¬ 
ing  problems,  SIAM  J.  Numer.  Anal.  20,  pp.  187-209. 

A.  R.  Curtis,  M.  J.  D.  Powell,  and  J.  K.  Reid  (1974),  On  the  estimation  of  sparse  Jacobian 
matrices,  J.  Inst.  Math.  Appl.,  13,  pp.  117-119. 

R.  S.  Dembo,  S.  C.  Eisenstat,  and  T.  Steihaug  (1982),  Inexact  Newton  methods,  SIAM  J.  of 

Numer.  AnaL,  19,  pp. 400-408. 

J.  E.  Dennis  (1977),  Nonlinear  least  squares  and  equations,  The  State  of  the  Art  in  Numeri¬ 
cal  Analysis,  D.  Jacobs,  ed.,  Academic  Press,  London,  pp. 269-312. 

J.  E.  Dennis  and  R.  B.  Schnabel  (1979),  Least  change  secant  updates  for  quasi-Newton 
methods,  SIAM  Rev.,  21,  pp. 443-459. 

I. S.  Duff  and  J.  K.  Reid  (1979),  Performance  Evaluation  of  Codes  for  sparse  matrices, in 

Performance  Evaluation  of  Numerical  Software,  L.D.Fosdick  (ed.)  North 
Holland  Publishing  Company,  pp.  121-135. 

S. C.  Eisenstat,  M.C.  Gursky,  M.H.  Schultz,  A.H.  Sherman  (1982),  Yale  Sparse  Matrix 

Package  I:  The  symmetric  Codes,  International  Journal  for  Numerical 
Methods  in  Engineering,  18,  pp. 1141-1151. 

T.  Elfving  (1980),  Block  iterative  methods  for  consistent  and  inconsistent  linear  systems, 

Numer.  Math.,  35,  pp.1-12. 

A.  S.  Householder  (1964),  The  Theory  of  Matrices  in  Numerical  Analysis,  Blaisdell  Publish¬ 
ing  Company,  New  York. 

A.  S.  Householder  and  F.  L.  Bauer  (1960),  On  certain  iterative  methods  for  solving  linear 
systems,  Numer.  Math.,  2,  pp. 55-59. 

H.  B.  Keller  (1965),  On  the  solution  of  singular  and  semidefinite  linear  systems  by  itera¬ 
tion,  SIAM  J.  of  Numer.  Math.,  2,  pp. 281-290. 

J.  M.  Ortega  (1972),  Numerical  Analysis:  A  Second  Course,  Academic  Press,  New  York. 

J.K.Reid  (1973),  Least  squares  solution  of  sparse  systems  of  non-linear  equations  by  a 

modified  Marquardt  algorithm,  in  Proc.  NATO  Conf.  at  Cambridge,  July  1972, 
North  Holland,  Amsterdam,  pp. 437-445. 

T.  Steihaug  (1980),  QiLasi-Newton  Methods  for  Large  Scale  Nonlinear  Problems,  Ph.D  disser¬ 
tation,  SOM  Technical  Report  #49,  Yale  University. 


25 

G.  W.  Stewart  (1973),  Introduction  to  Matrix  Computations,  Academic  Press,  New  York. 

D.  M.  Young  (1971),  Iterative  Solution  of  Large  Linear  Systems,  Academic  Press,  New 
York. 


